Patchwork Kriging for Large-scale Gaussian Process Regression
Authors
Abstract
This paper presents a new approach for Gaussian process (GP) regression for large datasets. The approach involves partitioning the regression input domain into multiple local regions with a different local GP model fitted in each region. Unlike existing local partitioned GP approaches, we introduce a technique for patching together the local GP models nearly seamlessly to ensure that the local GP models for two neighboring regions produce nearly the same response prediction and prediction error variance on the boundary between the two regions. This effectively solves the well-known discontinuity problem that degrades the boundary accuracy of existing local partitioned GP methods. Our main innovation is to represent the continuity conditions as additional pseudo-observations that the differences between neighboring GP responses are identically zero at an appropriately chosen set of boundary input locations. To predict the response at any input location, we simply augment the actual response observations with the pseudo-observations and apply standard GP prediction methods to the augmented data. In contrast to heuristic continuity adjustments, this has the advantage of working within a formal GP framework, so that the GP-based predictive uncertainty quantification remains valid. Our approach also inherits a sparse block-like structure for the sample covariance matrix, which results in computationally efficient closed-form expressions for the predictive mean and variance. In addition, we provide a new spatial partitioning scheme based on a recursive space partitioning along local principal component directions, which makes the proposed approach applicable for regression domains having more than two dimensions. Using three spatial datasets and three higher dimensional datasets, we investigate the numerical performance of the approach and compare it to several state-of-the-art approaches.
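To make the pseudo-observation idea concrete, the following is a minimal NumPy sketch, not the authors' code, under simplifying assumptions: two 1-D regions sharing a single boundary point, a priori independent local GPs f1 and f2 with a common squared-exponential kernel, and a pseudo-observation stating that f1 - f2 equals exactly zero at the boundary. All names (sq_exp_kernel, Xb, etc.) are illustrative.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.2, variance=1.0):
    # Squared-exponential covariance between 1-D input arrays A and B.
    d = A[:, None] - B[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)

# Toy data: region 1 is [0, 0.5), region 2 is [0.5, 1]; one shared boundary point.
f_true = lambda x: np.sin(2.0 * np.pi * x)
X1 = rng.uniform(0.0, 0.5, 30); y1 = f_true(X1) + 0.1 * rng.standard_normal(30)
X2 = rng.uniform(0.5, 1.0, 30); y2 = f_true(X2) + 0.1 * rng.standard_normal(30)
Xb = np.array([0.5])
noise = 0.1 ** 2

# Local GPs f1, f2 are a priori independent with a shared kernel.  The pseudo-
# observation says delta(Xb) = f1(Xb) - f2(Xb) is observed to equal exactly 0,
# which patches the two local models together at the boundary.
K11 = sq_exp_kernel(X1, X1) + noise * np.eye(len(X1))
K22 = sq_exp_kernel(X2, X2) + noise * np.eye(len(X2))
K1b = sq_exp_kernel(X1, Xb)            # Cov(f1(X1), delta(Xb)) =  k(X1, Xb)
K2b = -sq_exp_kernel(X2, Xb)           # Cov(f2(X2), delta(Xb)) = -k(X2, Xb)
Kbb = 2.0 * sq_exp_kernel(Xb, Xb)      # Var(delta(Xb)) = 2 * k(Xb, Xb)

# Covariance of the augmented vector [y1, y2, delta(Xb)]; the zero blocks come
# from the prior independence of the local GPs (the sparse block-like structure).
Z = np.zeros((len(X1), len(X2)))
C = np.block([[K11,   Z,     K1b],
              [Z.T,   K22,   K2b],
              [K1b.T, K2b.T, Kbb]])
y_aug = np.concatenate([y1, y2, np.zeros(len(Xb))])   # pseudo-observations = 0

# Standard GP prediction of f1 at test inputs in region 1, conditioning on the
# augmented data: cross-covariance with y2 is zero, with delta(Xb) it is +k.
Xs = np.linspace(0.0, 0.5, 50)
c_star = np.vstack([sq_exp_kernel(X1, Xs),
                    np.zeros((len(X2), len(Xs))),
                    sq_exp_kernel(Xb, Xs)])
alpha = np.linalg.solve(C, y_aug)
pred_mean = c_star.T @ alpha
pred_var = np.diag(sq_exp_kernel(Xs, Xs) - c_star.T @ np.linalg.solve(C, c_star))
print(pred_mean[:3], pred_var[:3])
```

Prediction in region 2 is analogous, with the sign of the boundary cross-covariance flipped; because the observation blocks of the two regions are uncorrelated, the augmented covariance retains the sparse block structure mentioned in the abstract.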
Similar resources
Hierarchical Mixture-of-Experts Model for Large-Scale Gaussian Process Regression
We propose a practical and scalable Gaussian process model for large-scale nonlinear probabilistic regression. Our mixture-of-experts model is conceptually simple and hierarchically recombines computations for an overall approximation of a full Gaussian process. Closed-form and distributed computations allow for efficient and massive parallelisation while keeping the memory consumption small. G...
Beyond Classification – Large-scale Gaussian Process Inference and Uncertainty Prediction
Due to the massive (labeled) data available on the web, a tremendous interest in large-scale machine learning methods has emerged in recent years. Whereas most of the work done in this new area of research has focused on fast and efficient classification algorithms, we show in this paper how other aspects of learning can also be covered using massive datasets. The paper briefly presents techniq...
Fast Gaussian Process Regression using KD-Trees
The computation required for Gaussian process regression with n training examples is about O(n³) during training and O(n) for each prediction. This makes Gaussian process regression too slow for large datasets. In this paper, we present a fast approximation method, based on kd-trees, that significantly reduces both the prediction and the training times of Gaussian process regression.
laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R
Gaussian process (GP) regression models make for powerful predictors in out-of-sample exercises, but cubic runtimes for dense matrix decompositions severely limit the size of data—training and testing—on which they can be deployed. That means that in computer experiment, spatial/geo-physical, and machine learning contexts, GPs no longer enjoy privileged status as modern data sets continue ball...
KNN-based Kalman filter: An efficient and non-stationary method for Gaussian process regression
The traditional Gaussian process (GP) regression often deteriorates when the data set is large-scale and/or non-stationary. To address these challenging data properties, we propose a K-Nearest-Neighbor-based Kalman filter for Gaussian process regression (KNN-KFGP). Firstly, we design a test-input-driven KNN mechanism to group the training set into a number of small collections. Secondly, we u...
Journal: CoRR
Volume: abs/1701.06655
Pages: -
Publication date: 2017